The inference of the couplings of an Ising model with given means and correlations is called the inverse Ising problem. This approach has received a lot of attention as a tool to analyze neural data. We show that autoregressive methods may be used to learn the couplings of an Ising model, also in the case of asymmetric connections and for multi-spin interactions. We find that, for each link, the linear Granger causality is twice the corresponding transfer entropy (i.e. the information flow on that link) in the weak coupling limit. For sparse connections and a low number of samples, the ℓ1 regularized least squares method is used to detect the interacting pairs of spins. Nonlinear Granger causality is related to multi-spin interactions.
I. INTRODUCTION

A large number of neurons is involved in any computation, and the presence of non-trivial correlations makes understanding the mechanisms of computation in the brain a difficult challenge [1]. The simplest model for describing multi-neuron spike statistics is the pairwise Ising model [2, 3]. The inference of the couplings of an Ising model from data, the inverse Ising problem, has recently attracted attention; see [?], where data from a simulated cortical network were considered. The general idea is to find an Ising model with the same means and pairwise correlations as the data. Several approximate methods can be used, such as that of Sessak and Monasson [?] or the inversion of the TAP equations [?]; these methods use equal-time correlations from the data. The flow of information between spins, instead, is related to correlations between variables at different times: one may expect that measuring the information flow between spins improves the estimate of the couplings from data.

Two major approaches are commonly used to estimate the information flow between variables: transfer entropy [8] and Granger causality [9]. It has recently been shown that for Gaussian variables Granger causality and transfer entropy are entirely equivalent, since the relation Granger causality = 2 × transfer entropy holds. This result provides a bridge between autoregressive and information-theoretic approaches to causal inference from data [10]. The purpose of this work is to explore the use of Granger causality to learn Ising models from data. Here the inverse Ising problem is viewed within the more general framework of inferring dynamical networks from data, a topic studied in recent papers [11]-[14]; its relevance stems from the fact that dynamical networks [15] model physical and biological behavior in many applications [16]. We show that for weak couplings the linear Granger causality of each link is twice the corresponding transfer entropy, also for Ising models; this justifies the use of autoregressive approaches for the inverse Ising problem. In the same limit, for each link, the coupling J and the causality δ are related by δ = J². When samples are limited, Granger causality gives poor results: for a low number of samples almost none of the connections are assessed as significant. In these cases we propose the ℓ1 regularized least squares method [17], a penalized autoregressive approach tailored to embody the sparsity assumption, to recover the non-vanishing connections of a sparse Ising model; as expected, the ℓ1 approach outperforms Granger causality in this case. Finally, we show that nonlinear Granger causality is related to multi-spin interactions.

The paper is organized as follows. In the next section we briefly recall the notions of Granger causality and transfer 
entropy, and we also describe the Ising models that we use for simulations. In section III we describe our results on 
fully connected models, sparse Ising models and models with higher order spin interactions. Some conclusions are 
drawn in section IV. 
II. GRANGER CAUSALITY AND TRANSFER ENTROPY

In this section we review the notions of Granger causality analysis and transfer entropy. We also discuss the application of these methods to binary time series arising from Ising models.

A. Granger causality 

Granger causality has become the method of choice to determine whether and how two time series exert causal influences on each other [18]. It is based on prediction: if the prediction error of the first time series is reduced by including measurements from the second one in the linear regression model, then the second time series is said to have a causal influence on the first one [19]. The estimation of linear Granger causality from Fourier and wavelet transforms of time series data has been addressed recently [20]. A nonlinear generalization of Granger causality has been developed via a kernel algorithm which embeds data into a Hilbert space and searches for linear Granger causality in that space [21]: the embedding is performed implicitly, by specifying the inner product between pairs of points [23], and a statistical procedure is used to avoid over-fitting.

Quantitatively, let us consider n time series {x_α(t)}, α = 1, …, n [23]; the lagged state vectors are denoted

$$X_\alpha(t) = \left(x_\alpha(t-m), \ldots, x_\alpha(t-1)\right),$$

m being the window length. Let ε(x_α|X) be the mean squared error of the prediction of x_α on the basis of all the vectors X (corresponding to the kernel approach described in [21]): with x_α normalized to zero mean and unit norm, ε(x_α|X) is equal to 1 − x̃_αᵀ x̃_α, where x̃_α, the predicted values of x_α using X, is the projection of x_α on a suitable space H. The prediction of x_α on the basis of all the variables but X_β, with error ε(x_α|X∖X_β), corresponds instead to the projection on a space H′ ⊆ H. Writing H = H′ ⊕ H^⊥, the space H^⊥ represents the information that one gains from the knowledge of X_β. The multivariate Granger causality index δ(β → α) is defined as the (normalized) variation of the error in the two conditions, i.e.

$$\delta(\beta\to\alpha) = \frac{\epsilon(x_\alpha|X\setminus X_\beta) - \epsilon(x_\alpha|X)}{\epsilon(x_\alpha|X\setminus X_\beta)}. \tag{1}$$
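For the linear case (the p = 1 polynomial kernel), eq. (1) can be illustrated with ordinary least squares standing in for the projections onto H and H′. The sketch below is ours, not the authors' kernel code; the names `lagged_states`, `prediction_error` and `granger_index` are hypothetical.

```python
import numpy as np


def lagged_states(x, m):
    """Split a (T, n) array into targets x(t) and lagged vectors X(t).

    Returns targets of shape (T - m, n) and lags of shape (T - m, n, m), where
    lags[t, a] = (x_a(t - m), ..., x_a(t - 1)) as in the definition of X_alpha(t).
    """
    T = x.shape[0]
    targets = x[m:]
    lags = np.stack([x[m - k:T - k] for k in range(m, 0, -1)], axis=2)
    return targets, lags


def prediction_error(y, Z):
    """epsilon(y | Z) = 1 - y_hat^T y_hat for y scaled to zero mean and unit norm.

    y_hat is the least-squares projection of y on the (centered) columns of Z.
    """
    y = y - y.mean()
    y = y / np.linalg.norm(y)
    if Z.shape[1] == 0:                  # no regressors: nothing is explained
        return 1.0
    Z = Z - Z.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    y_hat = Z @ coef
    return 1.0 - y_hat @ y_hat


def granger_index(x, source, target, m=1):
    """Linear delta(source -> target) of eq. (1) with window length m."""
    targets, lags = lagged_states(x, m)
    y = targets[:, target]
    n = lags.shape[1]
    full = lags.reshape(len(y), n * m)
    reduced = np.delete(lags, source, axis=1).reshape(len(y), (n - 1) * m)
    e_full = prediction_error(y, full)
    e_reduced = prediction_error(y, reduced)
    return (e_reduced - e_full) / e_reduced


if __name__ == "__main__":
    # Toy example: series 1 drives series 0 with one step of delay.
    rng = np.random.default_rng(0)
    T = 20000
    noise = rng.standard_normal((T, 2))
    x = noise.copy()
    x[1:, 0] = 0.5 * x[:-1, 1] + noise[1:, 0]
    print(granger_index(x, source=1, target=0))   # close to 0.25 / 1.25 = 0.2
    print(granger_index(x, source=0, target=1))   # close to 0
```

In the toy series, series 1 explains a fraction 0.25/1.25 = 0.2 of the variance of series 0, so δ(1 → 0) should come out near 0.2 and δ(0 → 1) near zero.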

Note that the numerator of eq. (1) coincides with the squared norm of the projection of x_α on H^⊥; as described in [18], one may write

$$\delta(\beta\to\alpha) = \frac{\sum_i r_i^2}{\epsilon(x_\alpha|X\setminus X_\beta)}, \tag{2}$$

where the r_i are suitable Pearson correlations (computed between x_α and an orthonormal basis of H^⊥). By summing in equation (2) only over significant correlations, one obtains a filtered linear Granger causality index, which measures causality without further statistical testing.
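Eq. (2) can be checked numerically: after building an orthonormal basis of H^⊥ from the source regressors, the squared correlations r_i add up to the numerator of eq. (1). In this sketch the significance filter is a hypothetical fixed cutoff on |r_i|√N, not necessarily the test used in [18]; `filtered_granger` and `z_threshold` are illustrative names.

```python
import numpy as np


def filtered_granger(y, Z_reduced, Z_source, z_threshold=3.0):
    """Eq. (2): delta = sum_i r_i^2 / eps(x_alpha | X without X_beta).

    The r_i are Pearson correlations between the target y and an orthonormal
    basis of H_perp, the part of span(Z_source) orthogonal to span(Z_reduced).
    Returns (unfiltered, filtered); the filtered index keeps only the r_i with
    |r_i| * sqrt(N) > z_threshold (a hypothetical Gaussian-null cutoff).
    """
    N = len(y)
    y = y - y.mean()
    y = y / np.linalg.norm(y)
    Zr = Z_reduced - Z_reduced.mean(axis=0)
    Zs = Z_source - Z_source.mean(axis=0)
    # Remove from the source regressors everything already contained in H'.
    coef, *_ = np.linalg.lstsq(Zr, Zs, rcond=None)
    perp = Zs - Zr @ coef
    U, _ = np.linalg.qr(perp)              # orthonormal, zero-mean columns
    r = U.T @ y                            # Pearson correlations with y
    y_hat = Zr @ np.linalg.lstsq(Zr, y, rcond=None)[0]
    eps_reduced = 1.0 - y_hat @ y_hat
    unfiltered = np.sum(r ** 2) / eps_reduced
    significant = np.abs(r) * np.sqrt(N) > z_threshold
    filtered = np.sum(r[significant] ** 2) / eps_reduced
    return unfiltered, filtered
```

Because both y and the basis vectors have zero mean and unit norm, each dot product `U.T @ y` is directly a Pearson correlation, which is what makes the decomposition in eq. (2) exact for the unfiltered sum.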

In [24] it has been shown that not all kernels are suitable for estimating causality. Two important classes of kernels which can be used to construct nonlinear causality measures are the inhomogeneous polynomial kernel (whose features are all the monomials in the input variables up to the p-th degree; p = 1 corresponds to linear Granger causality) and the Gaussian kernel. Note that in [10] a different index of causality is adopted:

$$\Lambda(\beta\to\alpha) = \log\frac{\epsilon(x_\alpha|X\setminus X_\beta)}{\epsilon(x_\alpha|X)} = -\log\left[1-\delta(\beta\to\alpha)\right]; \tag{3}$$

Λ and δ coincide at small δ. The window length m is usually chosen with the standard cross-validation scheme [23]; since in this work we know how the data are generated, here we use m = 1.
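Expanding eq. (3) makes the small-δ statement explicit (a worked check added for illustration):

$$\Lambda(\beta\to\alpha) = -\log\left[1-\delta\right] = \delta + \frac{\delta^{2}}{2} + \frac{\delta^{3}}{3} + \cdots, \qquad \text{so} \qquad \Lambda - \delta = O(\delta^{2}).$$

For example, δ = 0.04 gives Λ = −log 0.96 ≈ 0.0408.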

The formalism of Granger causality is constructed under the hypothesis that the time series take continuous values x_α. Recent papers have considered the application of Granger causality to data in the form of phases [26, 27]. Even though there is no theoretical justification in the case of binary variables, here we apply the formalism of Granger causality to n binary time series {σ_α(t) = ±1}, α = 1, …, n, by substituting

$$x_\alpha(t) \rightarrow \sigma_\alpha(t)$$

and

$$X_\alpha(t) \rightarrow \sigma_\alpha(t-1) \equiv S_\alpha(t);$$

in this work we justify the application of Granger causality to binary time series in terms of its relation with transfer entropy.
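The substitution above amounts to a re-indexing of the spin record. A minimal sketch (the helper name `spin_regression_arrays` is hypothetical) that produces the targets σ_α(t) and the regressors S_α(t) for window length m = 1:

```python
import numpy as np


def spin_regression_arrays(sigma):
    """Apply x_alpha(t) -> sigma_alpha(t) and X_alpha(t) -> S_alpha(t).

    sigma is a (T, n) array of +/-1 spins; with window m = 1 the lagged state
    of each spin is its previous value, S_alpha(t) = sigma_alpha(t - 1).
    Returns (targets, S), both of shape (T - 1, n).
    """
    sigma = np.asarray(sigma, dtype=int)
    if not np.all(np.abs(sigma) == 1):
        raise ValueError("spins must take the values -1 and +1")
    targets = sigma[1:]   # sigma_alpha(t) for t = 1, ..., T - 1
    S = sigma[:-1]        # S_alpha(t) = sigma_alpha(t - 1)
    return targets, S


if __name__ == "__main__":
    sigma = np.array([[1, -1], [-1, -1], [1, 1]])
    targets, S = spin_regression_arrays(sigma)
    print(targets)  # rows sigma(1), sigma(2)
    print(S)        # rows sigma(0), sigma(1)
```

The two arrays can then be passed to any least-squares routine exactly as continuous data would be.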
B. Transfer entropy

Using the same notation as in the previous subsection, the transfer entropy index T_e(β → α) is given by [8]

$$T_e(\beta\to\alpha) = \int dx_\alpha \int dX \; p(x_\alpha, X) \log\frac{p(x_\alpha|X)}{p(x_\alpha|X\setminus X_\beta)} \tag{4}$$

and measures the flow of information from β to α. For Gaussian variables it has been shown [10] that causality is determined by the transfer entropy, with Λ = 2T_e; hence δ = 1 − e^{−2T_e} for Gaussian variables. The probabilities p in (4) must be estimated from data using techniques of nonlinear time series analysis [28]. In the case of binary variables {σ_α = ±1}, the number of configurations is finite, the integrals in (4) become sums over configurations, and the probabilities can be estimated as frequencies in the data set at hand. Therefore

$$T_e(\beta\to\alpha) = \sum_{\sigma_\alpha=\pm1}\sum_{S_1=\pm1}\cdots\sum_{S_n=\pm1} p(\sigma_\alpha, S)\log\frac{p(\sigma_\alpha|S)}{p(\sigma_\alpha|S\setminus S_\beta)}, \tag{5}$$

where p(S) is the fraction of times that the configuration S = (S_1, …, S_n) is observed in the data set, and similar definitions hold for the other probabilities. We remark that the number of configurations grows exponentially with the number of spins, hence the direct evaluation of (5) is feasible only for small systems.
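Eq. (5) translates into counting configurations. The following plug-in estimator is a sketch assuming m = 1 and natural logarithms (so that T_e is measured in nats, the unit in which the relation with δ is stated); `transfer_entropy_binary` is a hypothetical name.

```python
import numpy as np
from collections import Counter


def transfer_entropy_binary(sigma, source, target):
    """Plug-in estimate of T_e(source -> target), eq. (5), with window m = 1.

    Probabilities are replaced by relative frequencies of configurations and
    natural logarithms are used (assumption). Only observed configurations
    contribute, but their number can grow like 2**n.
    """
    sigma = np.asarray(sigma, dtype=int)
    s_target = sigma[1:, target].tolist()                  # sigma_alpha(t)
    S = [tuple(row) for row in sigma[:-1].tolist()]        # S(t) = sigma(t - 1)
    S_red = [s[:source] + s[source + 1:] for s in S]       # S without S_beta
    N = len(s_target)
    joint = Counter(zip(s_target, S))
    joint_red = Counter(zip(s_target, S_red))
    count_S = Counter(S)
    count_red = Counter(S_red)
    te = 0.0
    for (s_a, s), c in joint.items():
        s_red = s[:source] + s[source + 1:]
        p_full = c / count_S[s]                             # p(sigma_alpha | S)
        p_red = joint_red[(s_a, s_red)] / count_red[s_red]  # p(sigma_alpha | S \ S_beta)
        te += (c / N) * np.log(p_full / p_red)              # weight p(sigma_alpha, S)
    return te
```

For a spin that copies another one with a one-step delay, the estimate of T_e(1 → 0) should approach log 2, while the reverse direction stays near zero.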



C. Ising models 

The binary time series analyzed in this work are generated by parallel updating of Ising variables {σ_α}, α = 1, …, n:

$$p\left(\sigma_\alpha(t) = +1 \,|\, S(t)\right) = \frac{1}{1 + e^{-2h_\alpha(t)}}, \tag{6}$$

where the local fields are given by

$$h_\alpha(t) = \sum_{\beta=1}^{n} J_{\alpha\beta}\,\sigma_\beta(t-1), \tag{7}$$

with couplings J_αβ. Starting from a random initial configuration of the spins, equations (6) are iterated and, after discarding the initial transient regime, N consecutive samples of the system are stored for further analysis.
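The update rule (6)-(7) can be sketched as follows. All spins are updated simultaneously from the previous configuration; the transient length, the seed handling and the function name `simulate_kinetic_ising` are our assumptions.

```python
import numpy as np


def simulate_kinetic_ising(J, n_samples, n_transient=1000, rng=None):
    """Parallel updating of eqs. (6)-(7).

    h_alpha(t) = sum_beta J[alpha, beta] * sigma_beta(t - 1) and
    p(sigma_alpha(t) = +1 | S(t)) = 1 / (1 + exp(-2 h_alpha(t))).
    Returns an (n_samples, n) integer array of +/-1 spins, recorded after
    discarding n_transient initial updates.
    """
    rng = np.random.default_rng(rng)
    n = J.shape[0]
    sigma = rng.choice([-1, 1], size=n)               # random initial configuration
    out = np.empty((n_samples, n), dtype=int)
    for t in range(n_transient + n_samples):
        h = J @ sigma                                  # local fields, eq. (7)
        p_up = 1.0 / (1.0 + np.exp(-2.0 * h))          # eq. (6)
        sigma = np.where(rng.random(n) < p_up, 1, -1)  # all spins updated at once
        if t >= n_transient:
            out[t - n_transient] = sigma
    return out


if __name__ == "__main__":
    # Asymmetric couplings drawn from N(0, J0^2), no self-interactions.
    rng = np.random.default_rng(0)
    n, J0 = 6, 0.2
    J = rng.normal(0.0, J0, size=(n, n))
    np.fill_diagonal(J, 0.0)
    sigma = simulate_kinetic_ising(J, n_samples=10000, rng=rng)
    print(sigma.shape, sigma.mean(axis=0))
```

Since eq. (6) implies E[σ_α(t)|S(t)] = tanh h_α(t), a one-link system with J_01 = 0.5 should give a lagged correlation close to tanh(0.5).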



III. ANALYSIS OF ISING MODELS 



A. Fully connected models 

In order to generalize Granger causality to discrete variables, we consider the regression function of (6). For weak couplings the conditional expectation (which coincides with the regression function [30]) can be written

$$\sum_{s=\pm1} s\, p(\sigma_\alpha = s|S) = \tanh(h_\alpha) \simeq \sum_{\beta=1}^{n} J_{\alpha\beta} S_\beta, \tag{8}$$

and is a linear function of the couplings. Therefore the linear causality is δ(β → α) ≃ J_αβ² at the lowest order in the J's. Analogously, expanding eq. (5) at the lowest order, the transfer entropy reads T_e(β → α) ≃ J_αβ²/2. This means that, for weak couplings, the value of J for any given link determines both the transfer entropy and the linear Granger causality for that link, and the two quantities differ only by a factor 2. The same relation proved in the Gaussian case thus holds also for Ising models at weak couplings. Being related to the transfer entropy, Granger causality measures the information flow in these systems, and this justifies the use of Granger causality methods for Ising systems. Synaptic couplings are directed, so J_αβ is not in general equal to J_βα (the equilibrium Ising model requires symmetric couplings [29]). Therefore we consider an asymmetric system of spins with couplings J_αβ drawn at random from a normal distribution with zero mean and standard deviation J_0; no self-interactions are assumed (J_αα = 0). In figure 1 we report the numerical estimates of linear causality and transfer entropy as a function of J, for several realizations of the couplings and for some values of J_0; the simulations confirm that for weak couplings a one-to-one correspondence exists between causality and transfer entropy.
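A heuristic sketch of the lowest-order expansion behind δ ≃ J² and T_e ≃ J²/2 follows. It rests on an added assumption (ours): at weak coupling the spins are nearly independent with zero mean, so dropping S_β from the conditioning amounts to dropping the term J_αβ S_β from h_α. Writing a = tanh h_α and b = tanh(h_α − J_αβ S_β), eq. (6) gives p(σ_α|S) = (1 + σ_α a)/2 and, approximately, p(σ_α|S∖S_β) ≈ (1 + σ_α b)/2; ⟨·⟩ denotes the stationary average and ⟨σ_α|S⟩ = a:

$$\begin{aligned} T_e(\beta\to\alpha) &\simeq \Big\langle \log\frac{1+\sigma_\alpha a}{1+\sigma_\alpha b}\Big\rangle \simeq \Big\langle \sigma_\alpha (a-b) - \tfrac{1}{2}(a^2-b^2) \Big\rangle = \Big\langle a(a-b) - \tfrac{1}{2}(a^2-b^2) \Big\rangle \\ &= \tfrac{1}{2}\big\langle (a-b)^2 \big\rangle \simeq \tfrac{1}{2} J_{\alpha\beta}^2 \langle S_\beta^2\rangle = \tfrac{1}{2} J_{\alpha\beta}^2, \\ \delta(\beta\to\alpha) &\simeq \frac{\langle (J_{\alpha\beta}S_\beta)^2\rangle}{\langle \sigma_\alpha^2\rangle} = J_{\alpha\beta}^2 \simeq 2\,T_e(\beta\to\alpha). \end{aligned}$$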

In figure 2 we depict, as a function of the coupling J, both the linear causality and the transfer entropy in a typical asymmetric model of six spins with N = 100, 1000 and 10000 samples, and J_0 = 0.2. In figure 3 we depict, as a function of the number of samples N, the difference between the values of transfer entropy and causality (as estimated on 10^… samples) and their estimates based on N samples (the difference is averaged over 1000 runs of the Ising system): at low N the estimates of causality are more reliable than those of transfer entropy. Moreover, the computational cost of estimating transfer entropy is much higher than that of evaluating Granger causality.

Another interesting situation is when all the couplings are equal to a positive quantity J (still without self-interactions). In figure 4 we depict the Granger causality and the transfer entropy as a function of J (these quantities are the same for all links, due to symmetry). For small values of J both quantities coincide with the values that correspond to J in the asymmetric model, and the relation δ/T_e = 2 holds. As J increases, however, this relation is increasingly violated. The departure of the ratio δ/T_e from 2 is connected to the emergence of feedback effects in the system; see, e.g., the dependence on J of the auto-correlation time of the magnetization Σ_{α=1}^{n} σ_α in fig. 5.
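The auto-correlation time in fig. 5 can be estimated in several ways, and the text does not say which one was used. The sketch below assumes the integrated autocorrelation time τ = 1/2 + Σ_k ρ(k), truncated at the first non-positive ρ(k); the function names, n = 6 and the run lengths are illustrative choices.

```python
import numpy as np


def homogeneous_magnetization(J, n=6, n_samples=20000, n_transient=1000, rng=None):
    """Magnetization sum_alpha sigma_alpha(t) of the homogeneous model
    (all couplings equal to J, no self-interactions), parallel updates (6)-(7)."""
    rng = np.random.default_rng(rng)
    Jmat = np.full((n, n), J)
    np.fill_diagonal(Jmat, 0.0)
    sigma = rng.choice([-1, 1], size=n)
    m = np.empty(n_samples)
    for t in range(n_transient + n_samples):
        p_up = 1.0 / (1.0 + np.exp(-2.0 * (Jmat @ sigma)))
        sigma = np.where(rng.random(n) < p_up, 1, -1)
        if t >= n_transient:
            m[t - n_transient] = sigma.sum()
    return m


def autocorrelation_time(m, max_lag=1000):
    """Integrated autocorrelation time tau = 1/2 + sum_k rho(k), summing the
    normalized autocorrelation rho(k) until it first becomes non-positive
    (one common convention, assumed here)."""
    m = np.asarray(m, dtype=float) - np.mean(m)
    var = m @ m / len(m)
    tau = 0.5
    for k in range(1, min(max_lag, len(m) - 1)):
        rho = (m[:-k] @ m[k:]) / (len(m) * var)
        if rho <= 0:
            break
        tau += rho
    return tau


if __name__ == "__main__":
    for J in (0.02, 0.08, 0.15):
        print(J, autocorrelation_time(homogeneous_magnetization(J, rng=0)))
```

With this convention an uncorrelated series gives τ ≈ 1/2, and τ should grow as J increases towards the ordered regime.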



B. Sparse models 

In many applications one may hypothesize that the connections among variables are sparse. The main goal, in those cases, is to infer which couplings do not vanish, independently of their strengths, in particular when the number of samples is low. Moreover, with limited data Granger causality gives poor results; indeed almost all connections would not be assessed as significant (for a given amount of data, only couplings stronger than a critical value can be recognized by Granger causality [21]).

A major approach to sparse signal reconstruction is the ℓ1 regularized least squares method. Although it has been developed to handle continuous variables, we will apply this method to the configurations of Ising models. For each target spin σ_α, the vector of couplings A_αβ, with β = 1, …, n, is sought as the minimizer of

$$\sum_{t}\left[\sigma_\alpha(t) - \sum_{\beta=1}^{n} A_{\alpha\beta}\, S_\beta(t)\right]^2 + \lambda\,\|A_{\alpha\cdot}\|_1, \tag{9}$$

where λ > 0 is a regularization parameter and ‖A_α·‖_1 = Σ_{β=1}^{n} |A_αβ| is the ℓ1 norm of the vector of couplings. As λ is increased, the number of vanishing couplings in the minimizers increases: λ controls the sparsity of the solution [31]. The strategy we use here to fix the value of λ is 10-fold cross-validation [?]: the original sample is randomly partitioned into 10 subsamples and, out of the 10 subsamples, a single subsample is retained as the validation data. The remaining 9 subsamples are used in (9) to determine the couplings A; the quality of this solution is evaluated as the average number of errors on the validation data. The cross-validation process is then repeated 10 times (the folds), with each of the 10 subsamples used exactly once as the validation data, and the error on the validation data is averaged over the 10 folds. The whole procedure is then repeated as λ is varied. The optimal value of λ is the one leading to the smallest average error on the validation data.
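Eq. (9) and the 10-fold cross-validation can be sketched with plain cyclic coordinate descent, which minimizes the same objective as the algorithm of [31] but is a different and much simpler solver. Reading "number of errors" as the fraction of validation spins whose sign is mispredicted is our assumption; the function names are hypothetical.

```python
import numpy as np


def l1_least_squares(X, y, lam, n_sweeps=100):
    """Minimize sum_t (y_t - X_t . a)^2 + lam * ||a||_1, i.e. eq. (9) for one
    target spin, by cyclic coordinate descent (columns of X must be non-zero)."""
    X = np.asarray(X, dtype=float)
    a = np.zeros(X.shape[1])
    col_sq = np.sum(X ** 2, axis=0)
    resid = np.asarray(y, dtype=float).copy()        # resid = y - X a
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            resid += X[:, j] * a[j]                  # take coordinate j out
            rho = X[:, j] @ resid
            # Soft thresholding at lam / 2 solves the one-dimensional problem.
            a[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_sq[j]
            resid -= X[:, j] * a[j]
    return a


def cross_validate_lambda(X, y, lambdas, n_folds=10, rng=None):
    """n-fold cross-validation over a grid of lambdas.

    The validation error is the fraction of target spins whose sign is
    mispredicted (assumed reading of "number of errors").
    Returns the best lambda and the mean validation error for each lambda.
    """
    rng = np.random.default_rng(rng)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    mean_errors = []
    for lam in lambdas:
        errors = []
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[i] for i in range(n_folds) if i != k])
            a = l1_least_squares(X[train], y[train], lam)
            pred = np.where(X[val] @ a >= 0, 1, -1)
            errors.append(np.mean(pred != y[val]))
        mean_errors.append(np.mean(errors))
    mean_errors = np.array(mean_errors)
    return lambdas[int(np.argmin(mean_errors))], mean_errors
```

Because eq. (9) uses a sum rather than a mean of squared errors, the useful range of λ grows with N; a grid spanning several orders of magnitude is a reasonable starting point. Any λ above twice the largest |Σ_t S_β(t) σ_α(t)| returns all-zero couplings.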

As an example, we simulate a system of 30 spins organized in ten modules of three spins each. The non-vanishing couplings of the Ising model are given by:

$$J(3i-2,\,3i-1) = 0.2, \qquad J(3i-1,\,3i) = 0.2, \qquad J(3i,\,3i-2) = -0.2, \qquad J(3i,\,3i-1) = 0.2, \tag{10}$$

for i = 1, 2, …, 10. After evaluating the couplings A_αβ using the algorithm described in [31], we calculate the sensitivity (fraction of non-vanishing connections J leading to non-vanishing couplings A) and the specificity (fraction of vanishing connections J leading to vanishing couplings A) as a function of λ. The ROC curves (as λ is varied, the ROC curve is the sensitivity as a function of 1 − specificity [32]) that we obtain for three values of the number of samples N (100, 250 and 500) are depicted in fig. 6. The stars on the curves represent the points corresponding to the value of λ found by ten-fold cross-validation; these points correspond to a good compromise between specificity and sensitivity. The empty symbols, instead, represent the values of sensitivity and specificity obtained using Granger causality in the three cases; the specificity of Granger causality is nearly one in all cases, while its sensitivity strongly depends on the number of samples and goes to zero as N decreases. To conclude this subsection, we have shown that with a low number of samples and sparse connections the ℓ1 regularized least squares method can be used to infer the connections in Ising models, and that it outperforms Granger causality in these situations. We remark that direct evaluation of the transfer entropy in these cases is infeasible.
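The coupling matrix of eq. (10) and the sensitivity/specificity used for the ROC curves can be sketched as follows; indices are shifted to 0-based, and the helper names `modular_couplings` and `sensitivity_specificity` are hypothetical.

```python
import numpy as np


def modular_couplings(n_modules=10, strength=0.2):
    """Coupling matrix of eq. (10): n_modules modules of three spins each.

    J[a, b] is the coupling from spin b to spin a, as in eq. (7); the text's
    1-based indices 3i-2, 3i-1, 3i become 0-based 3i-3, 3i-2, 3i-1 here.
    """
    n = 3 * n_modules
    J = np.zeros((n, n))
    for i in range(1, n_modules + 1):
        a, b, c = 3 * i - 3, 3 * i - 2, 3 * i - 1
        J[a, b] = strength      # J(3i-2, 3i-1) =  0.2
        J[b, c] = strength      # J(3i-1, 3i)   =  0.2
        J[c, a] = -strength     # J(3i, 3i-2)   = -0.2
        J[c, b] = strength      # J(3i, 3i-1)   =  0.2
    return J


def sensitivity_specificity(J_true, A_est, tol=1e-12):
    """Sensitivity: fraction of non-vanishing J recovered as non-vanishing A.
    Specificity: fraction of vanishing J recovered as vanishing A.
    Diagonal entries are ignored, since no self-interactions are assumed."""
    off_diag = ~np.eye(J_true.shape[0], dtype=bool)
    is_link = (np.abs(J_true) > tol) & off_diag
    no_link = (np.abs(J_true) <= tol) & off_diag
    found = np.abs(A_est) > tol
    return np.mean(found[is_link]), np.mean(~found[no_link])


if __name__ == "__main__":
    J = modular_couplings()
    print(np.count_nonzero(J))   # 40 non-vanishing couplings
```

Computing `sensitivity_specificity(J, A)` for the minimizers A obtained at each λ and plotting the sensitivity against 1 − specificity traces one ROC curve of the kind shown in fig. 6.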



C. Higher order spin interactions 



The case of higher-order spin interactions requires nonlinear Granger causality: in the presence of p-spin interactions, the kernel approach [21] with a polynomial kernel of degree at least p − 1 is needed. As an example, we consider a system of three spins with local fields given by:

$$h_1(t) = J\,\sigma_2(t-1)\,\sigma_3(t-1), \qquad h_2(t) = 0.5\,\sigma_3(t-1), \qquad h_3(t) = 0.5\,\sigma_2(t-1). \tag{11}$$

In figure 7 we depict the causalities δ(2 → 1) and δ(2 → 3) as a function of J, using the linear kernel and the p = 2 polynomial kernel. Note that, due to symmetry, δ(3 → 1) = δ(2 → 1) and δ(2 → 3) = δ(3 → 2); all the other causalities vanish. The linear approach is not able to detect the three-spin interaction, while with the nonlinear approach the interaction is correctly inferred. We stress that the presence of multi-spin interactions is connected to the presence of synergetic variables; see [33] for a discussion of the notions of redundancy and synergy in the framework of causality.
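The failure of the linear index and the success of the p = 2 kernel can be reproduced with explicit monomial features instead of a kernel matrix; the features span the monomials up to degree 2, in the spirit of the inhomogeneous polynomial kernel, but this is not the kernel algorithm of [21]. Function names, J = 1 and the run length are illustrative choices; spins 1, 2, 3 of eq. (11) are columns 0, 1, 2.

```python
import numpy as np


def simulate_three_spin(J, n_samples, n_transient=1000, rng=None):
    """Parallel updates with the local fields of eq. (11)."""
    rng = np.random.default_rng(rng)
    s = rng.choice([-1, 1], size=3)
    out = np.empty((n_samples, 3), dtype=int)
    for t in range(n_transient + n_samples):
        h = np.array([J * s[1] * s[2], 0.5 * s[2], 0.5 * s[1]])
        s = np.where(rng.random(3) < 1.0 / (1.0 + np.exp(-2.0 * h)), 1, -1)
        if t >= n_transient:
            out[t - n_transient] = s
    return out


def monomials(S, degree):
    """Features of the previous spins up to the given degree (1 or 2).

    For +/-1 spins the squares are constant, so degree 2 only adds the pairwise
    products. Also returns, for each column, the set of spins it involves.
    """
    k = S.shape[1]
    cols = [S[:, i] for i in range(k)]
    involved = [{i} for i in range(k)]
    if degree >= 2:
        for i in range(k):
            for j in range(i + 1, k):
                cols.append(S[:, i] * S[:, j])
                involved.append({i, j})
    return np.column_stack(cols).astype(float), involved


def prediction_error(y, Z):
    """1 - |y_hat|^2 for y scaled to zero mean and unit norm (least squares)."""
    y = (y - y.mean()) / np.linalg.norm(y - y.mean())
    Z = Z - Z.mean(axis=0)
    y_hat = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return 1.0 - y_hat @ y_hat


def causality(sigma, source, target, degree):
    """delta(source -> target) from monomial features: the reduced model
    drops every monomial that involves the source spin."""
    y, S = sigma[1:, target].astype(float), sigma[:-1]
    Z, involved = monomials(S, degree)
    keep = [k for k, spins in enumerate(involved) if source not in spins]
    e_full, e_red = prediction_error(y, Z), prediction_error(y, Z[:, keep])
    return (e_red - e_full) / e_red


if __name__ == "__main__":
    sigma = simulate_three_spin(J=1.0, n_samples=20000, rng=0)
    for degree in (1, 2):
        print(degree, causality(sigma, 1, 0, degree), causality(sigma, 1, 2, degree))
```

Since eq. (11) gives E[σ_1|S] = tanh(J) σ_2σ_3, no linear function of the single spins can capture the interaction, so the linear δ(2 → 1) stays near zero, while the degree-2 index should approach roughly tanh²(J) ≈ 0.58 for J = 1.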

It is interesting to show the performance of transfer entropy on the same problem (fig. 8): it correctly detects all the interactions, and the value of the transfer entropy is again very close to half of that given by nonlinear Granger causality. We stress that transfer entropy can be applied without prior assumptions about the order of the spin interactions. A major problem in the inference of dynamical networks is the selection of an appropriate model; in the case of transfer entropy this issue does not arise, although this advantage may be offset by the difficulty of reliably estimating entropies from finite samples.



IV. CONCLUSIONS 



We have proposed the use of autoregressive methods to learn Ising models from data. Commonly, the inverse Ising problem is formulated assuming symmetric interactions and is solved by exploiting the relations that exist, at equilibrium, between the equal-time pairwise correlations and the matrix of couplings. In the general case of asymmetric couplings, no equilibrium is reached, and time-delayed correlations among spins should also be used to infer the connections. We have shown that autoregressive approaches can solve the inverse Ising problem for weak couplings: for each link |J_αβ| = √δ(β → α), whilst the sign of J_αβ coincides with the sign of the linear correlation between σ_α and S_β. For weak couplings, Granger causality is proportional to the transfer entropy and requires fewer samples than transfer entropy to provide a reliable estimate of the information flow. For sparse connections and a low number of samples, the ℓ1 regularized least squares method is preferable to Granger causality; nonlinear Granger causality is related to multi-spin interactions.
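The inversion rule stated above, |J_αβ| = √δ(β → α) with the sign of the lagged correlation, can be sketched for a small asymmetric system. The couplings, the run length and the function names are illustrative assumptions, and the rule is only expected to hold at weak coupling.

```python
import numpy as np


def simulate(J, n_samples, n_transient=1000, rng=None):
    """Parallel updates of eqs. (6)-(7); returns (n_samples, n) spins."""
    rng = np.random.default_rng(rng)
    n = J.shape[0]
    sigma = rng.choice([-1, 1], size=n)
    out = np.empty((n_samples, n), dtype=int)
    for t in range(n_transient + n_samples):
        p_up = 1.0 / (1.0 + np.exp(-2.0 * (J @ sigma)))
        sigma = np.where(rng.random(n) < p_up, 1, -1)
        if t >= n_transient:
            out[t - n_transient] = sigma
    return out


def prediction_error(y, Z):
    """1 - |y_hat|^2 for y scaled to zero mean and unit norm (least squares)."""
    y = (y - y.mean()) / np.linalg.norm(y - y.mean())
    Z = Z - Z.mean(axis=0)
    y_hat = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return 1.0 - y_hat @ y_hat


def infer_couplings(sigma):
    """Weak-coupling estimate J_hat[a, b] = sign(corr(sigma_a(t), S_b(t))) *
    sqrt(delta(b -> a)), with linear Granger causality and window m = 1."""
    y_all, S = sigma[1:].astype(float), sigma[:-1].astype(float)
    n = sigma.shape[1]
    J_hat = np.zeros((n, n))
    for a in range(n):
        e_full = prediction_error(y_all[:, a], S)
        for b in range(n):
            if b == a:
                continue                      # no self-interactions assumed
            e_red = prediction_error(y_all[:, a], np.delete(S, b, axis=1))
            delta = max((e_red - e_full) / e_red, 0.0)
            sign = np.sign(np.corrcoef(y_all[:, a], S[:, b])[0, 1])
            J_hat[a, b] = sign * np.sqrt(delta)
    return J_hat


if __name__ == "__main__":
    J = np.array([[0.0, 0.2, 0.0, -0.15],
                  [0.1, 0.0, 0.0, 0.0],
                  [0.0, -0.2, 0.0, 0.1],
                  [0.0, 0.0, 0.15, 0.0]])
    sigma = simulate(J, n_samples=50000, rng=0)
    print(np.round(infer_couplings(sigma), 2))
```

At these coupling strengths tanh is nearly linear, so the estimates should track the true couplings to within a few hundredths; stronger couplings would bias |J_hat| downwards.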

The authors thank Amos Maritan and Marco Zamparo (University of Padova) for valuable discussions. 
FIG. 1: (Top) The estimates of linear Granger causality and transfer entropy, for each link, are plotted versus the coupling J. The points, corresponding to 15 realizations of the couplings of six-spin Ising systems with J_0 ranging in [0.1, 0.2], are displayed (the two quantities are estimated on samples of length N = 10^…). The curves are the quadratic expansions at weak coupling: δ = J² (dashed-dotted line) and T_e = J²/2 (continuous line). (Bottom) The same points are displayed in the (δ, T_e) plane, showing that δ = 2T_e at weak coupling.
FIG. 2: For a typical asymmetric Ising model of six spins and J_0 = 0.2, the linear Granger causality (right) and the transfer entropy (left) are depicted for each link as a function of its coupling J, for N = 100 (top), 1000 (middle) and 10000 (bottom) samples. The continuous curves represent the true values (obtained by fitting the points in fig. 1).




FIG. 3: The asymmetric Ising model described in figure 2 is considered here. Calling t_a the true value of the transfer entropy and t_b its estimate based on N samples, averaged over 1000 runs of the Ising system, we define E = |t_b − t_a|/t_a. The quantity E thus obtained is plotted here versus N (empty circles). A similar quantity E for Granger causality is also plotted (stars).






FIG. 4: The homogeneous Ising model, with n = 6 and uniform couplings J > 0, is considered. The linear Granger causality (stars) and the transfer entropy (empty circles) are depicted versus J. Both quantities are the same for all links, due to symmetry. The two curves are the relations between the coupling and the transfer entropy (and between the coupling and causality) which hold for the asymmetric Ising model (obtained by fitting the points of fig. 1).
FIG. 5: The auto-correlation time of the magnetization for the homogeneous Ising model analyzed in figure 4.
FIG. 6: The ROC curves (sensitivity vs 1 − specificity) for the detection of non-vanishing couplings in the sparse system of 30 spins described in the text; the curves correspond to three values of the number of samples N: 100 (dashed line), 250 (dotted line) and 500 (continuous line). The stars on these curves represent the points found by ten-fold cross-validation. The other three symbols are the performances of Granger causality for N = 100 (empty diamond), N = 250 (empty circle) and N = 500 (empty square).
FIG. 7: The causalities δ(2 → 1) and δ(2 → 3) are depicted as a function of J for the three-spin system described in the text. Causalities are estimated using the linear kernel (top) and the p = 2 polynomial kernel (bottom).
FIG. 8: The transfer entropies T_e(2 → 1) and T_e(2 → 3) are depicted as a function of J for the three-spin system described in the text.



